LncRNA ALMS1-IT1 is a novel prognostic biomarker and correlated with immune infiltrates in colon adenocarcinoma

Colon adenocarcinoma (COAD) is one of the most serious cancers. It is important to accurately predict prognosis and provide individualized treatment. Evidence suggests that clinicopathological features and immune status of the body are related to the occurrence and development of cancer. Expression of long non-coding RNA (LncRNA) ALMS1 intronic transcript 1 (ALMS1-IT1) is observed in some cancer types, and we believe that it may have the potential to serve as a marker of COAD. Therefore, we used the data obtained from the cancer genome atlas (TCGA) database to prove the relationship between ALMS1-IT1 and COAD. Wilcoxon rank sum test, Chi-square test, Fisher exact test and logistic regression were used to evaluate relationships between clinical-pathologic features and ALMS1-IT1 expression. Receiver operating characteristic curves were used to describe binary classifier value of ALMS1-IT1 using area under curve score. Kaplan–Meier method and Cox regression analysis were used to evaluate factors contributing to prognosis. Gene oncology (GO) and (Kyoto Encyclopedia of Genes and Genomes) KEGG enrichment analysis were used to predict the function of differentially expressed genes associated with ALMS1-IT1. Gene set enrichment analysis (GSEA) was used to predict canonical pathways associated with ALMS1-IT1.Immune infiltration analysis was performed to identify the significantly involved functions of ALMS1-IT1. Starbase database was used to predict miRNAs and RNA binding proteins (RBPs) that may interact with ALMS1-IT1. Increased ALMS1-IT1 expression in COAD was associated with N stage (P < .001), M stage (P = .003), Pathologic stage (P = .002), and Primary therapy outcome (P = .009). Receiver operating characteristic curve suggested the significant diagnostic and prognostic ability of ALMS1-IT1 (area under curve = 0.857). High ALMS1-IT1 expression predicted a poorer overall-survival (P = .005) and poorer progression-free interval (PFI) (P = .012), and ALMS1-IT1 expression was independently correlated with PFI in COAD patients (hazard ratio (HR) :1.468; 95% CI: 1.029–2.093; P =.034) (HR: 1.468; 95% CI: 1.029–2.093; P = .034). GO, KEGG, GSEA, and immune infiltration analysis showed that ALMS1-IT1 expression was correlated with regulating the function of DNA and some types of immune infiltrating cells. ALMS1-IT1 expression was significantly correlated with poor survival and immune infiltrations in COAD, and it may be a promising prognostic biomarker in COAD.


Introduction
Colorectal cancer (CRC) includes colon adenocarcinoma (COAD) and rectal adenocarcinoma, according to pathological classification, approximately 80% to 90% of CRC cases are COAD. [1] CRC ranks third in terms of incidence (10.2% of total cases) and is the second leading cause of cancer-related death globally (9.2% of all cases). [2] In the past few decades, the incidence and mortality of CRC have steadily declined globally, however, CRC is still the most common gastrointestinal malignancy and the second largest cancer-related disease cause of death. [3] The use of chemotherapy and surgical resection for malignant CRC is increasing, but the effect of these treatments has not been significantly improved. About half of CRC recur and patients die within 5 years. [4,5] Therefore, it is necessary to identify new diagnostic, prognostic biomarkers and therapeutic targets, as well as to study the potential molecular mechanisms of COAD. Good diagnostic and prognostic biomarkers should be closely related to the prognosis of patients and easy to detect. Medicine Encouragingly, a large amount of evidence indicates that the regulatory role of long non-coding RNAs (LncRNAs) is related to the development and progression of a variety of cancers. [6] LncRNAs are ≥200 nucleotides in length and do not encode proteins. According to its position and background in the genome, lncRNA can be divided into 5 main types: intergenic lncRNAs, intragenic lncRNAs, bidirectional lncRNAs, sense lncRNAs and antisense lncRNAs. [7] The mechanisms of lncRNA regulating gene expression mainly include transcriptional repression, RNA-DNA interaction (chromatin remodeling), nuclear RNA-RNA interaction and cytoplasmic RNA-RNA interaction. Their functions are to regulate a series of cellular biological processes, including chromatin remodeling, transcriptional and post-transcriptional events. [8,9] The most recognized molecular mechanism of lncRNAs is to act as a miRNA "sponge" to regulate downstream target genes. [10,11] LncRNAs is abnormally expressed in various types of cancer cells and plays an important role in several common hallmarks of cancer. [12] In addition, a growing number of studies indicate that lncRNAs may be identified as novel biomarkers for diagnosis, prognosis and metastasis prediction in various cancers. [13][14][15] These year, experiments have demonstrated that several lncRNAs are CRC-specific lncRNAs, such as PCAT-1, RP11-462C24.1, HOTAIR, and MALAT1 as candidate diagnostic biomarkers. [16][17][18] ALMS1-IT1,officical full name is ALMS1 intronic transcript 1. Up to now, there are few studies on ALMS1-IT1, and some studies believe that ALMS1-IT1 has prognostic value. [19] Recent studies have predicted that the expression of ALMS1-IT1 may be related to ferroptosis. [20] Bioinformatics analysis predicts that ALMS1-IT1 can serve as a prognostic biomarker for Head and neck squamous cell carcinoma (HNSCC). [21] Experiments have shown that ALMS1-IT1/AVL9 promotes the malignant progression of lung adenocarcinoma (LUAD) in part by regulating the cyclin-dependent kinase pathway. [22] Based on previous research results, we believe that ALMS1-IT1 may play an important role in the occurrence and development of COAD. Meanwhile, the role of ALMS1-IT1 in COAD has not been reported. Hence, in this research, we used the COAD RNA-seq data in the cancer genome atlas (TCGA) database to compare the difference of ALMS1-IT1 expression between tumor tissues and normal samples, and investigated the correlation between ALMS1-IT1 expression levels and clinical pathological features of COAD. Next, we evaluated the prognostic value of ALMS1-IT1 in COAD. In addition, gene oncology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and gene set enrichment analysis (GSEA) were performed on the high and low expression groups of ALMS1-IT1 to reveal its possible functions. Meanwhile, Starbase database was Figure 1. (A) Wilcoxon rank sum test was used to analyze the difference expression of ALMS1-IT1 in COAD tissues and adjacent colon tissues. (B) Wilcoxon signed rank sum test was used to detect the difference expression of ALMS1-IT1 in COAD tissues and adjacent colon tissues. (C) Wilcoxon rank sum test was used to analyze the difference expression of ALMS1-IT1 in normal colon tissues of GTEx combined with TCGA and COAD tissues of TCGA. (D) ROC curve showed the efficiency of ALMS1-IT1 expression level to distinguishing COAD tissue from non-tumor tissue. X-axis represents false positive rate, and Y-axis represents true positive rate. ALMS1-IT1 = ALMS1 intronic transcript 1, COAD = colon adenocarcinoma, GTEx = genotype-tissue expression, ROC = receiver operating characteristic, TCGA = the cancer genome atlas.
used to predict miRNAs and RNA binding proteins (RBPs) that may interact with ALMS1-IT1. Finally, by analyzing the correlation between ALMS1-IT1 expression and immune infiltration, we comprehensively explored and discussed the potential mechanism of ALMS1-IT1 regulating the occurrence and development of COAD.

RNA-sequencing data and bioinformatics analysis
We used TCGA database (https://portal.gdc.cancer.gov/) to collect RNA-seq data and clinical information from 521 cases of COAD projects, including 41 cases with matched adjacent tissues. The downloaded data format was level 3 HTSeq-fragments per kilobase per million and then was converted into transcripts per million format for subsequent analysis. We also download transcripts per million format RNA-seq data in TCGA and Genotype-Tissue Expression database that uniformly processed by Toil process from University of California Santa Cruz Xena (https://xenabrowser. net/datapages/). [23] All procedures performed in this study were in accordance with the Declaration of Helsinki (as revised in 2013). We used R package (DESeq2) [24] to go differential analysis of ALMS1-IT1 expression, adjusted P value < 0.05 and |logFC| > 2 were consider as cut off criteria, the Different Expression Genes (DEGs) obtained were used for GO, KEGG analysis, adjusted P value <.05 were consider as another cut off criteria, the DEGs obtained were used for GSEA.
The R(version 3.6.0) package org. Hs.e.g..db(3.10.0) was used to conversion gene ID, cluster Profiler(3.14.3) was used to perform GO, KEGG, and GSEA between high-and low-ALMS1-IT1 groups. [25,26] According to the default statistical method, the process was repeated 1000 times for each analysis and selected c2.cp. v7.2.symbols.gmt in MSigDB Collections as the reference gene collection, false discovery rate q-value < 0.25 and adjusted P adjust <.05 were considered to be significantly enriched.

Immune infiltration analysis by ssGSEA
The immune infiltration analysis of COAD was performed by single sample GSEA (ssGSEA) method from R (v.3.6.3) package GSVA (version 1.34.0), [27] and we quantified the infiltration levels of 24 immune cell types from gene expression profile in the literature. [28] In order to discover the correlation between ALMS1-IT1 and the infiltration levels of 24 immune cells, P values were determined by the Pearson and Wilcoxon rank sum test.

Statistical analysis
All statistical analyses were performed using R(v.3.6.3). Wilcoxon rank sum test, chi square test, Fisher exact test and logistic regression were used to analyze the relationship between clinical pathologic features and ALMS1-IT1. Kaplan-Meier method was used to calculate the overall survival rate and progression-free interval (PFI) of COAD patients from TCGA. Univariate and multivariate analysis were performed to estimate the association between clinical and genetic clinical characteristics and PFI using Cox proportional hazard models. P values <.05 were considered statistically significant.

Ethical approval
This study does not involve experiments that require ethical approval.

ALMS1-IT1 expression is correlated with poor clinicopathological features of COAD
In order to identify the difference of ALMS1-IT1 expression between COAD and normal tissues, we analyzed the expression level of ALMS1-IT1 in 480 COAD tissues and 41 adjacent normal colon tissues, and found that ALMS1-IT1 was highly expressed in COAD tissues (P < .001, Fig. 1A). Meanwhile, we also analyzed the expression of ALMS1-IT1 in 41 COAD tissues and their matched adjacent tissues. The results indicated that COAD tissues highly expressed ALMS1-IT1 (P < .001, Fig. 1B). Moreover, the expression of ALMS1-IT1 in normal samples of Genotype-Tissue Expression combined TCGA database and COAD samples of TCGA database was compared. We also found that ALMS1-IT1 was significantly overexpressed in COAD samples (P = .037, Fig. 1C). In addition, we used the receiver operating characteristic curve to analyze the effectiveness of ALMS1-IT1 expression level to distinguish COAD tissues from non-tumor tissues. The area under curve of ALMS1-IT1 was 0.857, suggesting that ALMS1-IT1 could be served as an ideal biomarker to distinguish COAD from non-tumor tissue (Fig. 1D).
The characteristics of patients were shown in Table 1, in which 478 primary COAD with both clinical and gene expression data were collected from TCGA database. According to the mean value of relative ALMS1-IT1 expression, the patients with COAD were divided into high (n = 239) and low (n = 239) expression groups. The association between the expression level of ALMS1-IT1 and the clinicopathological characteristics of COAD patients was evaluated. Chi-square test or Fisher's exact test revealed that ALMS1-IT1 expression was associated with N stage (P < .001), Gleason score (P = .002), primary therapy outcome (P = .001) and residual tumor (P < .001). Logistic regression method was also used to show the relationship between the clinicopathological characteristics of COAD and expression level of ALMS1-IT1. The results suggested that ALMS1-IT1 was significantly related to N stage (P < .001), M stage (P = .003), Pathologic stage (P = .002), and Primary therapy outcome (P = .009).
Logistic regression method was also used to show the relationship between the clinicopathological characteristics of COAD and expression level of ALMS1-IT1. The results suggested that ALMS1-IT1 was significantly related to N stage (P < .001), M stage (P = .002), Pathologic stage (P < .001), primary therapy outcome (P < .001) ( Table 2 and Fig. 2).

ALMS1-IT1 expression is correlated with poor prognosis of patients with COAD
The association between ALMS1-IT1 expression and OS or PFI of patients with COAD was evaluated by Kaplan-Meier analysis, which indicated that expression of ALMS1-IT1 is positively correlated with poor OS (P = .005, Fig. 3A) and poor PFI (P = .012, Fig. 3B) of COAD patients.    Table 3 REACTOME terms enriched in high-and low-ALMS1-IT1 groups by using GSEA.

ALMS1-IT1 related GO and KEGG analysis
To estimate the potential functions of DEGs in high-risk versus (vs) low-risk groups, we identify DEGs of ALMS1-IT1 in TCGA-COAD data under cutoff criteria of adjusted P value <.05 and |logFC|>2.KEGG pathway and GO annotation were performed by R package clusterProfiler(3.14.3).GO reveals the catalogs of biological process, cellular component, and molecular function. After multiple-test correction, KEGG pathways and GO terms with corrected P (P.adjust) value <.05 were considered to be prominently enriched in DEGs. We selected top 5 of the lowest adj. P value of GO and KEGG pathway enrichment analysis of 3303 DEGs related to ALMS1-IT1 in TCGA-COAD data ( Table 4 and Fig. 5).

The correlation between ALMS1-IT1 expression and immune infiltration
We further analyzed the correlation between expression of ALMS1-IT1 and immune infiltration by ssGSEA with Pearson.
The results showed that the expression of ALMS1-IT1 was negatively correlated with most immune cells, and the top 3 negative correlation coefficients were natural killer (NK) cells, immature dendritic cell (iDC) and NK CD56bright cells (P < .001, Fig. 6).

Target miRNA and protein prediction
Thirty-nine miRNAs and 42 RBPs that may interact with ALMS1-IT1 were identified using Starbase database (Fig. 7).

Discussion
In this study, the expression of LncRNA ALMS1-IT1 in COAD and its correlation with COAD diagnosis and prognosis were explored. In general, lncRNAs exert regulatory functions at different levels of gene expression, including chromatin modification, transcription, and post-transcription. [29] lncRNAs can interact with chromatin remodeling complexes to induce heterochromatin formation at specific genomic sites and reduce gene expression. In addition, lncRNAs interact with RNAbinding proteins and transcription factor co-activators, or regulate transcription by regulating the main promoters of their target genes. Mechanically, LncRNAs can communicate with DNA, mRNAs, ncRNAs and proteins and play cancer-related regulatory roles, such as signals, decoys, scaffolds and guidelines. [30,31] In addition, lncRNAs were often involved in different stages of CRC, from precancerous polyps to distant metastasis, which can be regarded as potential effective diagnostic biomarkers. [32,33] ALMS1-IT1 is a recently discovered lncRNA, which has been shown to play a key role in regulating tumor progression and predicting the survival time of tumor patients. [34] Luan's The Wilcoxon rank sum test was used to analyze the difference of iDC infiltration level between ALMS1-IT1 high and low expression groups. (F) The correlation between ALMS1-IT1 expression and NK CD56bright cells was detected by Pearson correlation method. (G) The Wilcoxon rank sum test was used to analyze the difference of NK CD56bright cells infiltration level between ALMS1-IT1 high and low expression groups. ALMS1-IT1 = ALMS1 intronic transcript 1, iDC = immature dendritic cells, Tcm = T central memory, Tem = T effector memory, Tgd = T gamma delta, Tfh = T follicular helper, NK = natural killer, pDCs = plasmacytoid dendritic cells. study points out that ALMS1-IT1 was highly expressed in LUAD, and the high expression of ALMS1-IT1 lead to poor prognosis in LUAD patients. Importantly, overexpression of ALMS1-IT1 helps to promote the viability of LUAD cells in vitro. [22] Lei Y's [35] reveals the significance of the interaction between lncRNAs and ceRNAs in small cell lung cancer, indicating that the integration of expression profiles and alternative splicing can be used to identify biomarkers and potential pathological changes, and ALMS1-IT1 is one of the critical gene. Lu Xing et al [21] reported that ALMS1-IT1 is up-regulated in high-risk groups of head and neck squamous cell carcinoma (HNSCC), which is related to the poor prognosis of HNSCC patients. In addition, it was also found that ALMS1-IT1 is a lncRNA targeting most miRNAs and proteins in HNSCC. All these studies suggest that ALMS1-IT1 may play different roles in various cancer types.
The present study demonstrated the elevated level of ALMS1-IT1 in COAD tissues, which is associated with poor patient outcome. A highlight of this work is to predict the potential mechanisms by which ALMS1-IT1 regulates the development of COAD. Through GO and KEGG, ALMS1-IT1 related gene were found to be involved in nucleosome assembly, chromatin assembly and DNA complex formation, indicating that ALMS1-IT1 may play a role in cell replication. Through GSEA,ALMS1-IT1 was found related in DNA methylation and histone methylation, indicating that ALMS1-IT1 may play a role in the maintenance of cell metabolism and nucleic acid modification and protein modification.
Another important aspect of this study is to investigate the relationship between ALMS1-IT1 expression and diverse immune infiltration levels in COAD. Our results revealed a moderate relationship between ALMS1-IT1 expression and infiltration level of NK cells, iDC and NK CD56 bright cells in COAD.
These correlations could be indicative of a potential mechanism by which ALMS1-IT1 inhibits the function of NK cells, NK CD56 bright cells and iDC, subsequently promotes the function Figure 7. To investigate the lncRNA-miRNA interaction network regulated by ALMS1-IT1, the 39 target miRNAs were identified using Starbase database (A). As for the lncRNA-protein network, the Starbase database was searched and the results revealed that there were 42 RBPs that interacted with ALMS1-IT1 (B). ALMS1-IT1 = ALMS1 intronic transcript 1, lncRNA = long non-coding RNA, RBPs = RNA binding proteins. Medicine of T central memory, and thus exerts its inhibitory effect on COAD. To our knowledge, despite some limitations, this is the first work to explore the relationship between ALMS1-IT1 and COAD. First of all, the current research is mainly based on bioinformatics analysis, which can be further strengthened through experimental research. Second, the number of healthy subjects as controls is very different from the number of cancer patients. Finally, retrospective research still has its limitations, especially the inconsistent intervention measures and lack of relevant information. Therefore, follow-up studies are needed to further verify our findings.

Conclusions
Collectively, we observed increased ALMS1-IT1 in COAD, which was also related to poor OS and poor PFI. Moreover, ALMS1-IT1 might participate in the development of COAD via affecting the function of DNA and immune infiltrating cells. The current study partially unveiled the roles of ALMS1-IT1 in COAD and provided a potential biomarker for the diagnosis and prognosis of COAD. This study was supported by Xiamen medical and health guidance project (No.3502Z20209072).Xiantao Academic provides technical support for R analysis.